This document is the summary of the R for Cognitive Psychologists workshop.
All correspondence related to this document should be addressed to:
Omid Ghasemi (Macquarie University, Sydney, NSW, 2109, AUSTRALIA)
Email: omidreza.ghasemi@hdr.mq.edu.au
Research Question
The aim of the study is to test if simple arguments are more effective in belief revision than more complex arguments. To that end, we present participants with an imaginary scenario (two alien creatures on a planet) and a theory (one creature is predator and the other one is prey) and we ask them to rate the likelihood truth of the theory based on a simple fact (We adapted this method from Gregg et al.,2017; see the original study here). Then, in a between-subject manipulation, participants will be presented with either 6 simple arguments (Modus Ponens conditionals) or 6 more complex arguments (Modus Tollens conditionals), and they will be asked to rate the likelihood truth of the initial theory on 7 stages.
The first stage is the base rating stage. The next three stages include supportive arguments of the theory and the last three arguments include disproving arguments of the theory. We hypothesized that the group with simple arguments shows better persuasion (as it reflects in higher ratings for the supportive arguments) and better dissuasion (as it reflects in lower ratings for the opposing arguments).
In the last part of the study, participants will be asked to answer several cognitive capacity/style measures including CRT, AOT-E, mindware, and numeracy scales. We hypothesized that cognitive ability, cognitive style, and open-mindedness are positive predictors of persuasion and dissuasion. These associations should be more pronounced for participants in the group with complex arguments because the ability and willingness to engage in deliberative thinking may favor participants to assess the underlying logical structure of those arguments. However, for participants in the simple group, the logical structure of arguments is more evident, so participants with lower ability can still assess the logical status of those arguments.

Thus, our hypotheses for this experiment are as follows:
Participants in the group with simple arguments have higher ratings for supportive arguments (They are more easily persuaded than those in the group with complex arguments).
Participants in the group with simple arguments have lower ratings for opposing arguments (They are more easily dissuaded than those in the group with complex arguments).
There are significant associations between CRT, AOT-E, Numeracy, and mindware with both persuasion and dissuasion indexes in each group and in the entire sample. The relationship between these measures should be stronger, although not significantly, for participants in the group with complex arguments.

Getting Ready
First, we need to design the experiment. For this experiment, we use online platforms for data collection. There are several options such as Gorilla, JSpsych, Qualtrics, psychoJS (pavlovia), etc. Since we do not need any reaction time data, we simply use Qualtrics. For an overview of different lab-based and online platforms, see here.
Next, we need to decide on the number of participants (sample size). For this study, we do not sue power analysis since we cannot access more than 120 participants. However, it is highly suggested calculate sample size using power estimation. You can find some nice tutorials on how to do that here, here, and here.
After we created the experiment and decided on the sample size, the next step is to preresigter the study. However, it would be better to do a pilot with 4 or 5 participants, clean all the data, do the desired analysis, and then pre-register the analysis and those codes. You can find the preregistration form for the current study here.
Finally, we need to restructure our project in a tidy folder with different sub-folders. Having a clean and tidy folder structure can save us! There are different formats of folder structure (for example, see here and here), but for now, we use the following structure:

Introduction to R
# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen=999) # turn off scientific notations
options(contrasts = c('contr.sum','contr.poly')) # set the contrast sum globally
options(knitr.kable.NA = '')
R can be used as a calculator. For mathematical purposes, be careful of the order in which R executes the commands.
10 + 10
## [1] 20
4 ^ 2
## [1] 16
(250 / 500) * 100
## [1] 50
R is a bit flexible with spacing (but no spacing in the name of variables and words)
10+10
## [1] 20
10 + 10
## [1] 20
R can sometimes tell that you’re not finished yet
10 +
How to create a variable? Variable assignment using <- and =. Note that R is case sensitive for everything
pay <- 250
month = 12
pay * month
## [1] 3000
salary <- pay * month
Few points in naming variables and vectors: use short, informative words, keep same method (e.g., not using capital words, use only _ or . ).
Function
Function is a set of statements combined together to perform a specific task. When we use a block of code repeatedly, we can convert it to a function. To write a function, first, you need to define it:
my_multiplier <- function(a,b){
result = a * b
return (result)
}
This code do nothing. To get a result, you need to call it:
my_multiplier (2,4)
## [1] 8
Fortunately, you do not need to write everything from scratch. R has lots of built-in functions that you can use:
round(54.6787)
## [1] 55
round(54.5787, digits = 2)
## [1] 54.58
Use ? before the function name to get some help. For example, ?round. You will see many functions in the rest of the workshop.
Basic Data Types in R:
function class() is used to show what is the type of a variable.
- Logical:
TRUE, FALSE can be abbreviated as T, F. They has to be capital, ‘true’ is not a logical data:
class(TRUE)
## [1] "logical"
class(F)
## [1] "logical"
- Numeric: all numbers e.g. 5, 10.5, 11,37; a special type of numeric is “integer” which is numbers without decimal. Integers are always numeric, but numeric is not always integer:
class(2)
## [1] "numeric"
class(13.46)
## [1] "numeric"
- Character: text for example, “I love R” or “4” or “4.5”:
class("ha ha ha ha")
## [1] "character"
class("56.6")
## [1] "character"
class("TRUE")
## [1] "character"
Can we change the type of data in a variable? Yes, you need to use the function as.---()
as.numeric(TRUE)
## [1] 1
as.character(4)
## [1] "4"
as.numeric("4.5")
## [1] 4.5
as.numeric("Hello")
## Warning: NAs introduced by coercion
## [1] NA
Data Structures in R
Vector: when there are more than one number or letter stored. Use the combine function c() for that.
sale <- c(1, 2, 3,4, 5, 6, 7, 8, 9, 10) # also sale <- c(1:10)
sale <- c(1:10)
sale * sale
## [1] 1 4 9 16 25 36 49 64 81 100
Subsetting a vector:
days <- c("Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
days[2]
## [1] "Sunday"
days[-2]
## [1] "Saturday" "Monday" "Tuesday" "Wednesday" "Thursday" "Friday"
days[c(2, 3, 4)]
## [1] "Sunday" "Monday" "Tuesday"
Exercise:
Create a vector named my_vector with numbers from 0 to 1000 in it:
my_vector <- (0:1000)
mean(my_vector)
## [1] 500
median(my_vector)
## [1] 500
min(my_vector)
## [1] 0
range(my_vector)
## [1] 0 1000
class(my_vector)
## [1] "integer"
sum(my_vector)
## [1] 500500
sd(my_vector)
## [1] 289.1081
List: allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other list.
my_list = list(sale, 1, 3, 4:7, "HELLO", "hello", FALSE)
my_list
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
##
## [[2]]
## [1] 1
##
## [[3]]
## [1] 3
##
## [[4]]
## [1] 4 5 6 7
##
## [[5]]
## [1] "HELLO"
##
## [[6]]
## [1] "hello"
##
## [[7]]
## [1] FALSE
Factor: Factors store the vector along with the distinct values of the elements in the vector as labels. The labels are always character irrespective of whether it is numeric or character. For example, variable gender with “male” and “female” entries:
gender <- c("male", "male", "male", " female", "female", "female")
gender <- factor(gender)
R now treats gender as a nominal (categorical) variable: 1=female, 2=male internally (alphabetically).
summary(gender)
## female female male
## 1 2 3
Question: why when we ran the above function i.e. summary(), it showed three and not two levels of the data? Hint: run ‘gender’.
gender
## [1] male male male female female female
## Levels: female female male
So, be careful of spaces!
Exercise:
Create a gender factor with 30 male and 40 females (Hint: use the rep() function):
gender <- c(rep("male",30), rep("female", 40))
gender <- factor(gender)
gender
## [1] male male male male male male male male male male
## [11] male male male male male male male male male male
## [21] male male male male male male male male male male
## [31] female female female female female female female female female female
## [41] female female female female female female female female female female
## [51] female female female female female female female female female female
## [61] female female female female female female female female female female
## Levels: female male
There are two types of categorical variables: nominal and ordinal. How to create ordered factors (when the variable is nominal and values can be ordered)? We should add two additional arguments to the factor() function: ordered = TRUE, and levels = c("level1", "level2"). For example, we have a vector that shows participants’ education level.
edu<-c(3,2,3,4,1,2,2,3,4)
education<-factor(edu, ordered = TRUE)
levels(education) <- c("Primary school","high school","College","Uni graduated")
education
## [1] College high school College Uni graduated
## [5] Primary school high school high school College
## [9] Uni graduated
## Levels: Primary school < high school < College < Uni graduated
Exercise:
We have a factor with patient and control values. Here, the first level is control and the second level is patient. Change the order of levels, so patient would be the first level:
health_status <- factor(c(rep('patient',5),rep('control',5)))
health_status
## [1] patient patient patient patient patient control control control
## [9] control control
## Levels: control patient
health_status_reordered <- factor(health_status, levels = c('patient','control'))
health_status_reordered
## [1] patient patient patient patient patient control control control
## [9] control control
## Levels: patient control
Finally, can you relabel both levels to uppercase characters? (Hint: check ?factor)
health_status_relabeled <- factor(health_status, levels = c('patient','control'), labels = c('Patient','Control'))
health_status_relabeled
## [1] Patient Patient Patient Patient Patient Control Control Control
## [9] Control Control
## Levels: Patient Control
Matrices: All columns in a matrix must have the same mode(numeric, character, etc.) and the same length. It can be created using a vector input to the matrix function.
my_matrix = matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)
my_matrix
## [,1] [,2] [,3]
## [1,] 1 4 7
## [2,] 2 5 8
## [3,] 3 6 9
Data frames: (two-dimensional objects) can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type. Let’s create a dataframe:
id <- 1:200
group <- c(rep("Psychotherapy", 100), rep("Medication", 100))
response <- c(rnorm(100, mean = 30, sd = 5),
rnorm(100, mean = 25, sd = 5))
my_dataframe <-data.frame(Patient = id,
Treatment = group,
Response = response)
We also could have done the below
my_dataframe <-data.frame(Patient = c(1:200),
Treatment = c(rep("Psychotherapy", 100), rep("Medication", 100)),
Response = c(rnorm(100, mean = 30, sd = 5),
rnorm(100, mean = 25, sd = 5)))
In large data sets, the function head() enables you to show the first observations of a data frames. Similarly, the function tail() prints out the last observations in your data set.
head(my_dataframe)
tail(my_dataframe)
|
Patient
|
Treatment
|
Response
|
|
1
|
Psychotherapy
|
25.72770
|
|
2
|
Psychotherapy
|
31.21325
|
|
3
|
Psychotherapy
|
29.99580
|
|
4
|
Psychotherapy
|
27.09983
|
|
5
|
Psychotherapy
|
26.81725
|
|
6
|
Psychotherapy
|
26.74217
|
|
|
Patient
|
Treatment
|
Response
|
|
195
|
195
|
Medication
|
20.40254
|
|
196
|
196
|
Medication
|
24.84327
|
|
197
|
197
|
Medication
|
22.79912
|
|
198
|
198
|
Medication
|
33.18292
|
|
199
|
199
|
Medication
|
24.58559
|
|
200
|
200
|
Medication
|
25.31449
|
Similar to vectors and matrices, brackets [] are used to selects data from rows and columns in data.frames:
my_dataframe[35, 3]
## [1] 28.7423
Exercise
How can we get all columns, but only for the first 10 participants?
my_dataframe[1:10, ]
|
Patient
|
Treatment
|
Response
|
|
1
|
Psychotherapy
|
25.72770
|
|
2
|
Psychotherapy
|
31.21325
|
|
3
|
Psychotherapy
|
29.99580
|
|
4
|
Psychotherapy
|
27.09983
|
|
5
|
Psychotherapy
|
26.81725
|
|
6
|
Psychotherapy
|
26.74217
|
|
7
|
Psychotherapy
|
34.86773
|
|
8
|
Psychotherapy
|
36.61779
|
|
9
|
Psychotherapy
|
34.68180
|
|
10
|
Psychotherapy
|
36.49156
|
How to get only the Response column for all participants?
my_dataframe[ , 3]
## [1] 25.72770 31.21325 29.99580 27.09983 26.81725 26.74217 34.86773
## [8] 36.61779 34.68180 36.49156 30.89007 35.41123 38.85842 30.57155
## [15] 30.42493 26.47939 27.50490 31.56233 37.30031 33.42363 29.32580
## [22] 32.88187 34.51232 23.88507 26.51493 25.42962 28.44067 29.23889
## [29] 25.24702 28.57276 30.52609 35.64100 25.68543 31.01235 28.74230
## [36] 25.07624 32.76672 38.24879 16.92928 24.42460 28.02411 34.69037
## [43] 30.40711 29.48666 37.96717 24.29767 32.64481 25.84578 29.28556
## [50] 44.11920 25.07969 21.43629 27.52244 37.24509 38.36706 35.46408
## [57] 23.45111 27.39520 23.59920 29.08243 32.88755 29.34568 36.76027
## [64] 30.98235 30.61806 35.48034 27.39239 32.46603 28.87644 33.11735
## [71] 28.45200 30.72661 27.58458 30.80198 26.08880 31.74715 34.42052
## [78] 36.14018 29.15979 15.75634 33.51822 37.88828 26.47385 33.26013
## [85] 34.78432 19.44502 31.93158 31.48073 31.48999 33.27916 28.62990
## [92] 29.38271 28.57686 30.40548 34.42423 29.16725 32.40763 37.41949
## [99] 28.13617 19.76279 25.79459 34.52733 26.44130 18.30816 27.64192
## [106] 17.47271 22.26139 24.29191 22.66851 15.63327 28.92695 20.81435
## [113] 21.88169 20.83022 33.60910 24.55989 17.62970 29.40357 30.38139
## [120] 24.33577 29.02207 16.05454 18.13544 34.35895 26.43107 20.91289
## [127] 17.87389 26.99069 25.12145 19.37052 20.88742 22.78649 13.54631
## [134] 33.21585 17.62122 25.56304 20.15252 27.47051 28.10522 24.03826
## [141] 29.80794 25.99564 21.25735 17.59239 27.46318 29.03501 30.01122
## [148] 33.74218 25.65373 26.46268 27.91304 27.24095 28.31243 22.22584
## [155] 31.01431 20.95776 28.17836 23.07360 20.12716 34.05968 22.96145
## [162] 15.87866 23.98782 19.77766 28.34237 30.83325 22.29114 24.69355
## [169] 12.81555 25.69261 23.22378 30.95828 24.23875 20.91039 32.04584
## [176] 25.42380 26.17715 25.99621 17.85196 23.01855 24.33618 25.42155
## [183] 16.25279 14.64445 22.04017 23.45330 22.37534 22.12577 24.22884
## [190] 27.32652 25.11790 18.05474 19.64780 25.75833 20.40254 24.84327
## [197] 22.79912 33.18292 24.58559 25.31449
Another easier way for selecting particular items is using their names that is more helpful than number of the rows in large data sets:
my_dataframe[ , "Response"]
# OR:
my_dataframe$Response
Data Cleaning
Now, suppose we tested 141 students. First, let’s read and check the uncleaned data:
# read the raw data
raw_data <- read_csv(here("raw_data","raw_argumentative_exp1.csv"))
head(raw_data)
|
end_date
|
status
|
ip_address
|
progress
|
duration_in_seconds
|
subject
|
recorded_date
|
response_id
|
location_latitude
|
location_longitude
|
distribution_channel
|
user_language
|
consent_form
|
age
|
gender
|
stage1_simple
|
stage2_simple
|
stage3_simple
|
stage4_simple
|
stage5_simple
|
stage6_simple
|
stage7_simple
|
stage1_complex
|
stage2_complex
|
stage3_complex
|
stage4_complex
|
stage5_complex
|
stage6_complex
|
stage7_complex
|
attention1
|
attention2
|
attention3
|
crt1
|
crt2
|
crt3
|
aote1
|
aote2
|
aote3
|
aote4
|
aote5
|
aote6
|
aote7
|
aote8
|
group
|
attention_correct
|
numeracy_total
|
mindware_total
|
|
24/9/20 22:02
|
IP Address
|
202.7.193.64
|
100
|
1517
|
subj1
|
24/9/20 22:02
|
R_1f298znjmVzcOjp
|
-33.85910
|
151.2002
|
anonymous
|
EN
|
I consent
|
18
|
Female
|
|
|
|
|
|
|
|
36
|
70
|
68
|
54
|
51
|
43
|
41
|
1
|
1
|
1
|
8
|
50
|
20
|
5
|
5
|
5
|
5
|
5
|
6
|
5
|
3
|
Complex
|
3
|
9
|
9
|
|
25/9/20 3:23
|
IP Address
|
220.245.220.94
|
100
|
1131
|
subj2
|
25/9/20 3:23
|
R_tL0A9P33Gi18I0N
|
-34.03680
|
150.6672
|
anonymous
|
EN
|
I consent
|
18
|
Male
|
50
|
55
|
55
|
90
|
75
|
50
|
35
|
|
|
|
|
|
|
|
1
|
1
|
1
|
8
|
10
|
39
|
6
|
6
|
6
|
5
|
6
|
6
|
5
|
6
|
Simple
|
3
|
9
|
10
|
|
27/9/20 22:59
|
IP Address
|
121.210.0.211
|
100
|
709
|
subj3
|
27/9/20 22:59
|
R_1LNyJhCKxTAAMOW
|
-33.85910
|
151.2002
|
anonymous
|
EN
|
I consent
|
19
|
Female
|
50
|
50
|
77
|
60
|
25
|
20
|
13
|
|
|
|
|
|
|
|
0
|
1
|
1
|
8
|
50
|
20
|
6
|
5
|
4
|
5
|
5
|
6
|
5
|
6
|
Simple
|
2
|
10
|
8
|
|
27/9/20 23:18
|
IP Address
|
58.179.100.109
|
100
|
949
|
subj4
|
27/9/20 23:18
|
R_3enxzUsEYgs5r1a
|
-12.63921
|
141.8741
|
anonymous
|
EN
|
I consent
|
27
|
Female
|
|
|
|
|
|
|
|
70
|
80
|
90
|
95
|
70
|
80
|
90
|
1
|
1
|
1
|
8
|
50
|
20
|
6
|
6
|
6
|
1
|
6
|
6
|
6
|
1
|
Complex
|
3
|
8
|
7
|
|
28/9/20 0:45
|
IP Address
|
120.154.53.68
|
100
|
1097
|
subj5
|
28/9/20 0:45
|
R_2Qzl2096a4KNE29
|
-33.85910
|
151.2002
|
anonymous
|
EN
|
I consent
|
19
|
Male
|
71
|
73
|
85
|
95
|
95
|
32
|
32
|
|
|
|
|
|
|
|
1
|
1
|
1
|
4
|
10
|
39
|
6
|
6
|
5
|
5
|
6
|
6
|
6
|
6
|
Simple
|
3
|
11
|
11
|
|
28/9/20 2:20
|
IP Address
|
1.129.107.6
|
100
|
880
|
subj6
|
28/9/20 2:20
|
R_esb71WOTQySjusF
|
-33.85910
|
151.2002
|
anonymous
|
EN
|
I consent
|
20
|
Female
|
|
|
|
|
|
|
|
89
|
100
|
44
|
55
|
100
|
50
|
55
|
1
|
1
|
1
|
8
|
50
|
20
|
6
|
6
|
6
|
5
|
6
|
6
|
6
|
6
|
Complex
|
3
|
10
|
10
|
Now, let’s do some cleanining using dplyr, tidyr and other tidyverse libraries. Finally, we will check the data:
cleaned_data <- raw_data %>%
filter(progress == 100) %>% # filter out unfinished participants
select(-end_date, -status,-ip_address, -duration_in_seconds, -recorded_date:-user_language) %>% #remove some useless columns
mutate(aote_total= aote1+aote2+aote3+aote4+aote5+aote6+aote7+aote8, # create a total score for our questionnaire
attentive= case_when(attention_correct >= 2~ 'Yes',T~ 'No')) %>%
mutate(crt1= case_when(crt1=='4'~ 1,T~0),
crt2= case_when(crt2=='10'~ 1,T~0),
crt3= case_when(crt3=='39'~ 1,T~0),
crt_total= crt1 + crt2 + crt3) %>%
select(-attention1:-aote8) %>%
pivot_longer(cols = c(stage1_simple:stage7_simple,stage1_complex:stage7_complex),names_to = 'stage',values_to = 'truth_estimate') %>% # make our dataframe long
#pivot_wider(names_from = stage, values_from= truth_estimate) # this code change our dataframe back to wide
filter(!is.na(truth_estimate)) %>% #remove rows with truth_estimate == NA
mutate(stage= gsub("_.*", "", stage)) %>%
rename(consent= consent_form) %>% # rename a column
#mutate_if(is.character, factor) %>%
mutate(subject= factor(subject), # convert all characters to factor
group = factor(group),
attentive = factor(attentive),
stage = factor(stage))
|
progress
|
subject
|
consent
|
age
|
gender
|
group
|
attention_correct
|
numeracy_total
|
mindware_total
|
aote_total
|
attentive
|
crt_total
|
stage
|
truth_estimate
|
|
100
|
subj1
|
I consent
|
18
|
Female
|
Complex
|
3
|
9
|
9
|
39
|
Yes
|
0
|
stage1
|
36
|
|
100
|
subj1
|
I consent
|
18
|
Female
|
Complex
|
3
|
9
|
9
|
39
|
Yes
|
0
|
stage2
|
70
|
|
100
|
subj1
|
I consent
|
18
|
Female
|
Complex
|
3
|
9
|
9
|
39
|
Yes
|
0
|
stage3
|
68
|
|
100
|
subj1
|
I consent
|
18
|
Female
|
Complex
|
3
|
9
|
9
|
39
|
Yes
|
0
|
stage4
|
54
|
|
100
|
subj1
|
I consent
|
18
|
Female
|
Complex
|
3
|
9
|
9
|
39
|
Yes
|
0
|
stage5
|
51
|
|
100
|
subj1
|
I consent
|
18
|
Female
|
Complex
|
3
|
9
|
9
|
39
|
Yes
|
0
|
stage6
|
43
|
Ok, now the data is clean and tidy which means:
- Each variable forms a column.
- Each observation forms a row.
- Each type of observational unit forms a table (Wickham, 2014).
Check the dataframe and all the data types:
str(cleaned_data)
## tibble [917 × 14] (S3: tbl_df/tbl/data.frame)
## $ progress : num [1:917] 100 100 100 100 100 100 100 100 100 100 ...
## $ subject : Factor w/ 131 levels "subj1","subj10",..: 1 1 1 1 1 1 1 45 45 45 ...
## $ consent : chr [1:917] "I consent" "I consent" "I consent" "I consent" ...
## $ age : num [1:917] 18 18 18 18 18 18 18 18 18 18 ...
## $ gender : chr [1:917] "Female" "Female" "Female" "Female" ...
## $ group : Factor w/ 2 levels "Complex","Simple": 1 1 1 1 1 1 1 2 2 2 ...
## $ attention_correct: num [1:917] 3 3 3 3 3 3 3 3 3 3 ...
## $ numeracy_total : num [1:917] 9 9 9 9 9 9 9 9 9 9 ...
## $ mindware_total : num [1:917] 9 9 9 9 9 9 9 10 10 10 ...
## $ aote_total : num [1:917] 39 39 39 39 39 39 39 46 46 46 ...
## $ attentive : Factor w/ 2 levels "No","Yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ crt_total : num [1:917] 0 0 0 0 0 0 0 2 2 2 ...
## $ stage : Factor w/ 7 levels "stage1","stage2",..: 1 2 3 4 5 6 7 1 2 3 ...
## $ truth_estimate : num [1:917] 36 70 68 54 51 43 41 50 55 55 ...
Finally, we save our data to the cleaned_data folder.
write_csv(cleaned_data, here("cleaned_data","argumentative_exp1.csv"))
Descriptive Statistics
Note: All the data that we use here is manipulated (fabricated) for teaching purpuses. In our study, we failed to find such beautiful and interesting results.
Now, let’s do some descriptive statistics. First, we can open a new script called analysis_exp1.r and read the cleaned data again.
data_exp1 <- read_csv(here("cleaned_data","argumentative_exp1.csv"))
How many participants in total?
data_exp1 %>% summarise(n= n_distinct(subject))
how many participants in each group?
data_exp1 %>%
group_by(subject) %>%
filter(row_number()==1) %>%
ungroup () %>%
group_by(group) %>%
count()
|
group
|
n
|
|
Complex
|
65
|
|
Simple
|
66
|
Find the mean and sd for numeric variables using base R summary function:
data_exp1 %>%
group_by(subject) %>%
filter(row_number()==1) %>%
ungroup () %>%
summary()
## progress subject consent age
## Min. :100 Length:131 Length:131 Min. :16.00
## 1st Qu.:100 Class :character Class :character 1st Qu.:18.00
## Median :100 Mode :character Mode :character Median :19.00
## Mean :100 Mean :21.15
## 3rd Qu.:100 3rd Qu.:20.00
## Max. :100 Max. :63.00
## gender group attention_correct numeracy_total
## Length:131 Length:131 Min. :0.000 Min. : 0.000
## Class :character Class :character 1st Qu.:2.000 1st Qu.: 8.000
## Mode :character Mode :character Median :3.000 Median :10.000
## Mean :2.573 Mean : 8.779
## 3rd Qu.:3.000 3rd Qu.:10.000
## Max. :3.000 Max. :11.000
## mindware_total aote_total attentive crt_total
## Min. : 4.00 Min. :19.0 Length:131 Min. :0.0000
## 1st Qu.: 8.00 1st Qu.:33.5 Class :character 1st Qu.:0.0000
## Median : 8.00 Median :39.0 Mode :character Median :0.0000
## Mean : 8.45 Mean :38.2 Mean :0.8092
## 3rd Qu.: 9.00 3rd Qu.:43.0 3rd Qu.:1.0000
## Max. :12.00 Max. :48.0 Max. :3.0000
## stage truth_estimate
## Length:131 Min. : 0.00
## Class :character 1st Qu.: 43.50
## Mode :character Median : 61.00
## Mean : 57.09
## 3rd Qu.: 75.50
## Max. :100.00
Alternatively, we can use base Rsummaryfunctionskimr` library:
data_exp1 %>%
group_by(subject) %>%
filter(row_number()==1) %>%
ungroup () %>%
dplyr::select (age, numeracy_total, mindware_total, aote_total, crt_total) %>%
skimr::skim()
|
skim_type
|
skim_variable
|
n_missing
|
complete_rate
|
numeric.mean
|
numeric.sd
|
numeric.p0
|
numeric.p25
|
numeric.p50
|
numeric.p75
|
numeric.p100
|
numeric.hist
|
|
numeric
|
age
|
0
|
1
|
21.1526718
|
6.515630
|
16
|
18.0
|
19
|
20
|
63
|
▇▁▁▁▁
|
|
numeric
|
numeracy_total
|
0
|
1
|
8.7786260
|
2.274576
|
0
|
8.0
|
10
|
10
|
11
|
▁▁▁▂▇
|
|
numeric
|
mindware_total
|
0
|
1
|
8.4503817
|
1.683466
|
4
|
8.0
|
8
|
9
|
12
|
▁▅▇▆▃
|
|
numeric
|
aote_total
|
0
|
1
|
38.1984733
|
6.153698
|
19
|
33.5
|
39
|
43
|
48
|
▁▂▇▇▆
|
|
numeric
|
crt_total
|
0
|
1
|
0.8091603
|
1.038598
|
0
|
0.0
|
0
|
1
|
3
|
▇▃▁▂▂
|
Exercise
For this exercise, we use a dataset of one of my own studies. In this study, we asked participants to guess the physical brightness of reasoning arguments and then we gave a cognitive ability test. (See the original study here). Open ghasemi_brightness_exp4.csv file and answer to the following questions:
- How many participants did we test in total?
- Find out how many male and female we tested.
- Calculate mean and sd for age and cognitive ability (
ah4).
ghasemi_data <- read_csv(here("cleaned_data","ghasemi_brightness_exp4.csv"))
ghasemi_data %>% summarise(n = n_distinct(participant)) # number of participants:200
ghasemi_data %>% group_by (participant) %>% filter (row_number()==1) %>% group_by (gender) %>% summarise(n= n()) %>% ungroup() # 183 female, 17 male
|
gender
|
n
|
|
Female
|
183
|
|
Male
|
17
|
ghasemi_data %>% dplyr::select (age, ah4) %>% skimr::skim() # mean and sd for age and cognitive ability
Data summary
|
|
|
|
Name
|
Piped data
|
|
Number of rows
|
38400
|
|
Number of columns
|
2
|
|
_______________________
|
|
|
Column type frequency:
|
|
|
numeric
|
2
|
|
________________________
|
|
|
Group variables
|
|
Variable type: numeric
|
skim_variable
|
n_missing
|
complete_rate
|
mean
|
sd
|
p0
|
p25
|
p50
|
p75
|
p100
|
hist
|
|
age
|
0
|
1
|
22.20
|
6.78
|
17
|
19
|
20
|
22
|
52
|
▇▁▁▁▁
|
|
ah4
|
0
|
1
|
39.55
|
9.46
|
11
|
34
|
40
|
46
|
61
|
▁▃▇▆▂
|
Data Visualization
First, we need to create a dataset with aggregated truth estimate scores over group and stage. We will use this dataset for line and bar graphs.
aggregated_data_exp1 <- data_exp1 %>%
group_by(stage, group) %>%
mutate(truth_estimate = mean(truth_estimate)) %>%
ungroup()
barplot_exp1 <- aggregated_data_exp1 %>%
ggplot(aes(x=stage, y= truth_estimate, fill=group)) +
geom_bar(stat = "identity", position= "dodge")+
# stat_summary(fun= mean, geom = "bar", position = "dodge")+ # can be used instead of geom_bar() for long dataframes
labs (x= '', y= "Truth Likelihhod Estimate") +
theme_bw() +
scale_fill_jama()
barplot_exp1

barplot_facet_exp1 <- aggregated_data_exp1 %>%
ggplot(aes(x=group, y= truth_estimate, fill=stage)) +
geom_bar(stat = "identity", position= "dodge")+
labs (x= '', y= "Truth Likelihhod Estimate") +
theme_bw() +
theme(legend.position = "none",
axis.text=element_text(size=11),
axis.title = element_text(size = 12)) +
facet_wrap(~stage)+
scale_fill_jco()
barplot_facet_exp1

lineplot_exp1 <- aggregated_data_exp1 %>%
ggplot(aes(x=factor(stage), y= truth_estimate, group= group, color= group)) +
geom_line(aes(linetype= group)) +
geom_point(size= 5)+
labs (x= '', y= "Truth Likelihhod Estimate") +
theme_classic() +
theme(legend.position = "bottom",
axis.text=element_text(size=11),
axis.title = element_text(size = 12)) +
scale_color_nejm()
lineplot_exp1

violinplot_exp1 <- data_exp1 %>%
ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
geom_violin()+
labs (x= '', y= "Truth Likelihhod Estimate") +
theme_bw() +
theme(legend.position = "bottom",
axis.text=element_text(size=11),
axis.title = element_text(size = 12)) +
scale_fill_d3()
violinplot_exp1

boxplot_exp1 <- data_exp1 %>%
ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
geom_boxplot()+
#geom_point(position = position_dodge(width=0.75), alpha= .5)+
labs (x= '', y= "Truth Likelihhod Estimate") +
theme_bw() +
theme(legend.position = "bottom",
axis.text=element_text(size=11),
axis.title = element_text(size = 12)) +
scale_fill_simpsons()
boxplot_exp1

boxplot_facet_exp1 <- data_exp1 %>%
ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
geom_boxplot()+
labs (x= '', y= "Truth Likelihhod Estimate") +
theme_bw() +
theme(legend.position = "bottom",
axis.text=element_text(size=11),
axis.title = element_text(size = 12),
axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
facet_wrap(~group)+
scale_color_simpsons()
boxplot_facet_exp1

How to combine multiple plots? We can use the patchwork package. A nice tutorial on using this package can be found here
combined_plot_exp1 <- (barplot_facet_exp1+lineplot_exp1) / (violinplot_exp1+boxplot_exp1)
combined_plot_exp1

How to save a plot?
ggsave(combined_plot_exp1, filename = here("outputs","combined_plot_exp1.png"), dpi=300)
Data Analysis
t-test
Is there a difference between groups at the first stage? Ideally, we want participants’ ratings at the first stage be similar for both groups because we have not done any manipulations. Previous graphs showed us that ratings of simple and complex group at this stage are pretty close. Let’s test that using an independent t-test (because we have 2 independent groups):
# Is there a difference between groups at the first stage?
data_exp1 %>%
group_by(group) %>%
filter(stage=='stage1') %>%
ungroup () %>%
t.test(truth_estimate~group, data = ., paired=FALSE)
##
## Welch Two Sample t-test
##
## data: truth_estimate by group
## t = -0.75145, df = 104.95, p-value = 0.4541
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.883716 5.351781
## sample estimates:
## mean in group Complex mean in group Simple
## 55.44615 58.71212
Now, we wonder if opposing arguments were effective at all, regardless of participants’ group. So, we would like to test if ratings at the final stage are lower than ratings at the stage 4? Since a pair of score at stage 4 and stage 7 is coming from a same person, we use paired t-test.
# Is there a difference between ratings of stage4 and stage7?
data_exp1 %>%
filter(stage=='stage4' | stage=='stage7') %>%
ungroup () %>%
t.test(truth_estimate~stage, data = ., paired=TRUE)
##
## Paired t-test
##
## data: truth_estimate by stage
## t = 12.788, df = 130, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 32.64368 44.59296
## sample estimates:
## mean of the differences
## 38.61832
Exercise
John et al. (2019) investigated the consequences of backing down (changing one’s mind in lights of evidence)and how other people view someone who change their mind. In their second experiments, they presented participants either with a person who changes their mind or a person who refuses to back down. Then, they asked participants to rate how intelligent and confident the person is (See the original study here). They reported that:
“Relative to the entrepreneur who did not back down, participants judged the entrepreneur who backed down as more intelligent (M_backed_down=5.13 out of 7, SD=1.09; M_did_not_back_down=3.97, SD=1.54; t(271.12)=−7.59, p < .001) but less confident (M_backed_down=4.50 out of 7, SD=1.36; M_did_not_back_down=5.65, SD=1.10; t(291.01)=8.08, p < .001).”.
Open the john_backdown_exp2.csv file and try to reproduce their results. Run two separate independent t-test, one with intelligent as the dependent variable and one with confident as the dependent variable. For both t-test, use back_down as the between-subject independent variable.
john_data <- read_csv(here("cleaned_data","john_backdown_exp2.csv"))
t.test(intelligent~back_down, data = john_data, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: intelligent by back_down
## t = 7.5853, df = 271.12, p-value = 0.0000000000005319
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.8577107 1.4590076
## sample estimates:
## mean in group backed_down mean in group did_not_back_down
## 5.129412 3.971053
t.test(confident~back_down, data = john_data, paired=FALSE)
##
## Welch Two Sample t-test
##
## data: confident by back_down
## t = -8.0763, df = 291.01, p-value = 0.00000000000001787
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.4257768 -0.8670294
## sample estimates:
## mean in group backed_down mean in group did_not_back_down
## 4.503268 5.649671
Analysis of Variance (ANOVA)
Now, let’s answer our main question: Do participants in the simple group show higher ratings for supportive arguments (stage 2 to 4) and lower ratings for opposing arguments (stage 5 to 7), compared to participants in the complex group? If this is the case. we expect an interaction in the traditional Analysis of Variance (AONVA) test.
aov_m1 <- aov_car (truth_estimate ~ group*stage +
Error(subject/stage), data = data_exp1)
|
Effect
|
df
|
MSE
|
F
|
ges
|
p.value
|
|
group
|
1, 129
|
949.04
|
0.01
|
<.0001
|
.94
|
|
stage
|
4.45, 574.05
|
515.69
|
59.48 ***
|
.25
|
<.0001
|
|
group:stage
|
4.45, 574.05
|
515.69
|
13.34 ***
|
.07
|
<.0001
|
As you can see, we found a significant main effect of stage and a significant group by stage interaction. We can use the emmeans package to do post-hoc tests.
# main effect of stage
emmeans(aov_m1, 'stage')
## stage emmean SE df lower.CL upper.CL
## stage1 57.1 1.88 763 53.4 60.8
## stage2 66.8 1.88 763 63.1 70.5
## stage3 74.6 1.88 763 70.9 78.3
## stage4 79.6 1.88 763 75.9 83.3
## stage5 62.4 1.88 763 58.7 66.1
## stage6 52.5 1.88 763 48.9 56.2
## stage7 41.1 1.88 763 37.4 44.8
##
## Results are averaged over the levels of: group
## Warning: EMMs are biased unless design is perfectly balanced
## Confidence level used: 0.95
pairs(emmeans(aov_m1, 'stage'), adjust= 'holm')
## contrast estimate SE df t.ratio p.value
## stage1 - stage2 -9.74 2.42 774 -4.031 0.0004
## stage1 - stage3 -17.53 2.42 774 -7.256 <.0001
## stage1 - stage4 -22.50 2.42 774 -9.311 <.0001
## stage1 - stage5 -5.29 2.42 774 -2.187 0.1160
## stage1 - stage6 4.53 2.42 774 1.876 0.1220
## stage1 - stage7 15.98 2.42 774 6.613 <.0001
## stage2 - stage3 -7.79 2.42 774 -3.225 0.0066
## stage2 - stage4 -12.76 2.42 774 -5.280 <.0001
## stage2 - stage5 4.46 2.42 774 1.844 0.1220
## stage2 - stage6 14.28 2.42 774 5.908 <.0001
## stage2 - stage7 25.72 2.42 774 10.644 <.0001
## stage3 - stage4 -4.97 2.42 774 -2.055 0.1206
## stage3 - stage5 12.25 2.42 774 5.069 <.0001
## stage3 - stage6 22.07 2.42 774 9.132 <.0001
## stage3 - stage7 33.51 2.42 774 13.869 <.0001
## stage4 - stage5 17.22 2.42 774 7.124 <.0001
## stage4 - stage6 27.04 2.42 774 11.188 <.0001
## stage4 - stage7 38.48 2.42 774 15.924 <.0001
## stage5 - stage6 9.82 2.42 774 4.064 0.0004
## stage5 - stage7 21.27 2.42 774 8.800 <.0001
## stage6 - stage7 11.45 2.42 774 4.736 <.0001
##
## Results are averaged over the levels of: group
## P value adjustment: holm method for 21 tests
# group by stage interaction
emmeans(aov_m1, "group", by= "stage")
## stage = stage1:
## group emmean SE df lower.CL upper.CL
## Complex 55.4 2.67 766 50.2 60.7
## Simple 58.7 2.65 761 53.5 63.9
##
## stage = stage2:
## group emmean SE df lower.CL upper.CL
## Complex 63.3 2.67 766 58.1 68.6
## Simple 70.3 2.65 761 65.1 75.5
##
## stage = stage3:
## group emmean SE df lower.CL upper.CL
## Complex 70.0 2.67 766 64.7 75.2
## Simple 79.3 2.65 761 74.1 84.5
##
## stage = stage4:
## group emmean SE df lower.CL upper.CL
## Complex 71.6 2.67 766 66.3 76.8
## Simple 87.6 2.65 761 82.4 92.8
##
## stage = stage5:
## group emmean SE df lower.CL upper.CL
## Complex 64.2 2.67 766 58.9 69.4
## Simple 60.5 2.65 761 55.3 65.8
##
## stage = stage6:
## group emmean SE df lower.CL upper.CL
## Complex 57.9 2.67 766 52.7 63.2
## Simple 47.2 2.65 761 41.9 52.4
##
## stage = stage7:
## group emmean SE df lower.CL upper.CL
## Complex 51.1 2.67 766 45.9 56.4
## Simple 31.1 2.65 761 25.9 36.3
##
## Warning: EMMs are biased unless design is perfectly balanced
## Confidence level used: 0.95
update(pairs(emmeans(aov_m1, "group", by= "stage")), by = NULL, adjust = "holm")
## contrast stage estimate SE df t.ratio p.value
## Complex - Simple stage1 -3.27 3.76 763 -0.868 0.6673
## Complex - Simple stage2 -6.96 3.76 763 -1.851 0.1935
## Complex - Simple stage3 -9.29 3.76 763 -2.469 0.0550
## Complex - Simple stage4 -16.02 3.76 763 -4.259 0.0001
## Complex - Simple stage5 3.64 3.76 763 0.967 0.6673
## Complex - Simple stage6 10.79 3.76 763 2.868 0.0213
## Complex - Simple stage7 20.08 3.76 763 5.337 <.0001
##
## P value adjustment: holm method for 7 tests
You can use the afex_plot function from afex to create beautiful plots. Those plots interacts nicely with ggplot:
afex_plot(aov_m1, x = "stage", trace = "group", error='between',
line_arg = list(size=1),
point_arg = list(size=3.5),
data_arg = list(size= 1, color= 'grey', width=.4),
data_geom = geom_boxplot,
mapping = c("linetype", "shape", "fill"),
legend_title = "Group") +
labs(y = "Truth Likelihhod Estimate", x = "") +
theme_bw()+ # remove the grey background and grid
theme(axis.text=element_text(size=13),
axis.title = element_text(size = 13),
legend.text=element_text(size=13),
legend.title=element_text(size=13),
legend.position='bottom',
legend.key.size = unit(1, "cm"),
legend.background = element_rect(colour = 'black', fill = 'white', linetype='solid'))+
scale_color_simpsons() +
scale_fill_simpsons()

If you are interested in this topic, check out this nice tutorial about using afex to run ANOVA, and also this interesting tutorial on the emmeans package.
Exercise
Rotello et al. (2018) investigated the association between the race (White vs. Black faces) and the gun-tool judgments. In their first experiments, they presented participants with 16 White male faces and 16 Black male faces, and following that 8 images of guns and 8 images of tools. They asked participants to judge if the object is a tool or a gun by pressing keyboard buttons. Then, they ran an ANOVA to see if participants’ gun responses are higher for any of the races. So, they included prime race (Black, White) and target identity (gun, tool) as independent variables and participants’ gun responses as dependent variable into their linear model (See the original study here). They found that:
“Participants made more gun responses to guns than to tools, F(1,45) = 53243, p < 0.0001, η2g = 0.998. However, the race of the prime face did not matter, F(1,45) = 0.287, p > 0.59, η2g = 0.001, nor was there an interaction of prime race with target object, F(1,45) = 0.022, p > 0.88, η2g = 0.000)”.
Open the rotello_shooter_exp1.csv file and try to reproduce their results. Run an ANOVA (type III) with resp as the dependent variable and target, prime, and their interaction as independent variables.
# load the general data file
rotello_data <- read_csv(here("cleaned_data","rotello_shooter_exp1.csv"))
# ANOVA
rotello_aov <- aov_car (resp ~ target*prime +
Error(subject/target*prime), data = rotello_data)
|
Effect
|
df
|
MSE
|
F
|
ges
|
p.value
|
|
target
|
1, 45
|
0.00
|
53242.99 ***
|
>.99
|
<.0001
|
|
prime
|
1, 45
|
0.00
|
0.29
|
.001
|
.59
|
|
target:prime
|
1, 45
|
0.00
|
0.02
|
<.0001
|
.88
|
Correlation
Now, let’s answer to another question of this study: does persuasion and dissuasion is related to open-mindedness, cognitive ability, reasoning abilities, and cognitive style? To answer this question, we need to create two indexes (scores) one for persuasion and one for dissuasion. Then we can do a correlation test:
cor_data_exp1 <- data_exp1 %>%
pivot_wider(names_from = stage, values_from = truth_estimate) %>%
group_by(subject) %>%
mutate(persuasion_index= stage2+ stage3+ stage4 - stage1,
dissuasion_index= (101-stage5) + (101-stage6) + (101-stage7) - (101-stage4)) %>%
ungroup()%>%
dplyr::select(persuasion_index,dissuasion_index,aote_total,numeracy_total,crt_total,mindware_total)
#---------- Base R:
cor(cor_data_exp1, method = "pearson", use = "complete.obs")
#---------- Psych library:
cor_data_exp1 %>%
psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)
#---------- Correlation library:
correlation::correlation(cor_data_exp1) %>% summary()
#---------- apaTables library:
cor_data_exp1 %>%
apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
|
|
persuasion_index
|
dissuasion_index
|
aote_total
|
numeracy_total
|
crt_total
|
mindware_total
|
|
persuasion_index
|
1.00
|
0.26
|
0.25
|
0.16
|
0.16
|
0.11
|
|
dissuasion_index
|
0.26
|
1.00
|
-0.03
|
-0.03
|
-0.09
|
0.15
|
|
aote_total
|
0.25
|
-0.03
|
1.00
|
0.40
|
0.26
|
0.11
|
|
numeracy_total
|
0.16
|
-0.03
|
0.40
|
1.00
|
0.44
|
0.15
|
|
crt_total
|
0.16
|
-0.09
|
0.26
|
0.44
|
1.00
|
0.29
|
|
mindware_total
|
0.11
|
0.15
|
0.11
|
0.15
|
0.29
|
1.00
|
|
Parameter
|
mindware_total
|
crt_total
|
numeracy_total
|
aote_total
|
dissuasion_index
|
|
persuasion_index
|
0.11
|
0.16
|
0.16
|
0.25
|
0.26
|
|
dissuasion_index
|
0.15
|
-0.09
|
-0.03
|
-0.03
|
|
|
aote_total
|
0.11
|
0.26
|
0.40
|
|
|
|
numeracy_total
|
0.15
|
0.44
|
|
|
|
|
crt_total
|
0.29
|
|
|
|
|
Exercise
Pennycook et al. (2020) investigated the relationship between actively open-minded thinking style about evidence (AOT-E) and different political, scientific, and religious beliefs (see the original paper here). In their first experiment, they calculated the correlation of AOTE and scientific beliefs items (global warming, evolution, etc.) and they found the following results:
Open the pennycook_aote_exp1.csv file and try to reproduce their results by creating the same correlation matrix.
pennycook_data <- read_csv(here("cleaned_data","pennycook_aote_exp1.csv"))
#---------- Base R:
cor(pennycook_data, method = "pearson", use = "complete.obs")
#---------- Psych library:
pennycook_data %>%
psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)
#---------- Correlation library:
correlation::correlation(pennycook_data) %>% summary()
#---------- apaTables library:
pennycook_data %>%
apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
|
Parameter
|
trust_scien
|
gm_health
|
tech_problems
|
modern_medicine
|
old_earth
|
vaccines
|
stem_cell
|
big_bang
|
evolution
|
global_warming
|
|
aote
|
0.35
|
0.36
|
0.44
|
0.33
|
0.40
|
0.47
|
0.45
|
0.51
|
0.51
|
0.37
|
|
global_warming
|
0.42
|
0.06
|
0.14
|
0.18
|
0.33
|
0.26
|
0.31
|
0.33
|
0.38
|
|
|
evolution
|
0.48
|
0.33
|
0.28
|
0.36
|
0.47
|
0.39
|
0.54
|
0.78
|
|
|
|
big_bang
|
0.49
|
0.37
|
0.28
|
0.36
|
0.45
|
0.37
|
0.54
|
|
|
|
|
stem_cell
|
0.47
|
0.34
|
0.36
|
0.47
|
0.40
|
0.40
|
|
|
|
|
|
vaccines
|
0.43
|
0.52
|
0.49
|
0.53
|
0.38
|
|
|
|
|
|
|
old_earth
|
0.29
|
0.24
|
0.21
|
0.33
|
|
|
|
|
|
|
|
modern_medicine
|
0.43
|
0.42
|
0.47
|
|
|
|
|
|
|
|
|
tech_problems
|
0.33
|
0.39
|
|
|
|
|
|
|
|
|
|
gm_health
|
0.31
|
|
|
|
|
|
|
|
|
|
Linear Regression
In the previous section, we found that open-mindedness (AOT-E) is correlated with persuasion. Now, one may ask if open-mindedness can predict persuasion after controlling for reasoning and controlling abilities? To answer that, we can run a multiple regression analysis:
exp1_reg=lm(persuasion_index ~ aote_total+ numeracy_total+ crt_total+ mindware_total,
data=cor_data_exp1)
|
term
|
estimate
|
std.error
|
statistic
|
p.value
|
|
(Intercept)
|
78.57
|
33.08
|
2.38
|
0.02
|
|
aote_total
|
1.62
|
0.72
|
2.23
|
0.03
|
|
numeracy_total
|
0.72
|
2.11
|
0.34
|
0.73
|
|
crt_total
|
3.09
|
4.51
|
0.68
|
0.49
|
|
mindware_total
|
1.77
|
2.52
|
0.70
|
0.48
|
Exercise
Trémolière and Djeriouat (2020) examined the role of cognitive reflection and belief in science in climate change skepticism. In their first study, they revealed that cognitive reflection and belief in science negetively predicted climate change skepticism even after controlling for demographic and cognitive ability variables (see the original paper here).
Open the tremoliere_data_exp1.csv file and try to reproduce their results by running a multiple linear regression. Enter age, gender, education, belief in science, literacy, numeracy (Numtotal), and cognitive reflection as predictors and enter climate change skepticism (climato) as the outcome variable.
Tremoliere_data <- read_csv(here("cleaned_data","tremoliere_data_exp1.csv"))
Tremoliere_reg=lm(Climato ~ Age+ Gender+ Education+ BeliefInSciencetotal+ Literacy+ Numtotal+ CognitiveReflection,
data=Tremoliere_data)
|
term
|
estimate
|
std.error
|
statistic
|
p.value
|
|
(Intercept)
|
57.57
|
5.19
|
11.09
|
0.00
|
|
Age
|
0.01
|
0.05
|
0.24
|
0.81
|
|
Gender
|
-5.68
|
1.34
|
-4.23
|
0.00
|
|
Education
|
0.54
|
0.38
|
1.43
|
0.15
|
|
BeliefInSciencetotal
|
-0.20
|
0.06
|
-3.62
|
0.00
|
|
Literacy
|
-0.49
|
0.51
|
-0.96
|
0.34
|
|
Numtotal
|
-1.52
|
0.83
|
-1.82
|
0.07
|
|
CognitiveReflection
|
-18.58
|
4.26
|
-4.37
|
0.00
|
|
r.squared
|
adj.r.squared
|
sigma
|
statistic
|
p.value
|
df
|
logLik
|
AIC
|
BIC
|
deviance
|
df.residual
|
nobs
|
|
0.19
|
0.17
|
12.65
|
11.91
|
0
|
7
|
-1467.77
|
2953.54
|
2988.81
|
58235.89
|
364
|
372
|
Rmarkdown
To be completed…
References
Ghasemi, O., Handley, S., & Howarth, S. (2020). The Bright Homunculus in our Head: Individual Differences in Intuitive Sensitivity to Logical Validity.
John, L. K., Jeong, M., Gino, F., & Huang, L. (2019). The self-presentational consequences of upholding one’s stance in spite of the evidence. Organizational Behavior and Human Decision Processes, 154, 1-14.
Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2020). On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs. Judgment and Decision Making, 15(4), 476.
Rotello, C. M., Kelly, L. J., Heit, E., Vazire, S., & Vul, E. (2018). The Shape of ROC Curves in Shooter Tasks: Implications for Best Practices in Analysis. Collabra: Psychology, 4(1).
Trémolière, B., & Djeriouat, H. (2020). Don’t you see that its cold! Exploring the roles of cognitive reflection, climate science literacy, illusion of knowledge, and political orientation in climate change skepticism.
Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.
---
title: "R for Cognitive Psychologists"
author:
  - name: "Omid Ghasemi"
    affiliation: Macquarie University
    email: omidreza.ghasemi@hdr.mq.edu.au
  - name: "Mahdi Mazidi"
    affiliation: University of Western Australia
    email: mahdi.mazidisharafabadi@research.uwa.edu.au
date: "`r format(Sys.time(), '%d %B, %Y')`"
output: 
  html_document:
    keep_md: yes
    number_sections: true
    theme: cerulean
    code_download: true
    #code_folding: hide
    toc: true
    toc_float: true
    df_print: "kable"
---

This document is the summary of the **R for Cognitive Psychologists** workshop. 

All correspondence related to this document should be addressed to: 

<center>
Omid Ghasemi (Macquarie University, Sydney, NSW, 2109, AUSTRALIA) 

Email: omidreza.ghasemi@hdr.mq.edu.au 
</center>



<style>

body{ /* Normal  */
      font-size: 18px;
      text-align: justify;
      line-height: 1.6;
      font-family: "Times New Roman", Times, serif;
}
code.r{ /* Code block */
    font-size: 14px;
}
pre { /* Code block - determines code spacing between lines */
    font-size: 12px;
}

</style>


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(fig.align="center")
```



```{r libraries, message=FALSE, echo=F}
# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen=999) # turn off scientific notations
options(contrasts = c('contr.sum','contr.poly')) # set the contrast sum globally 
options(knitr.kable.NA = '')
```


# Research Question


The aim of the study is to test if simple arguments are more effective in belief revision than more complex arguments. To that end, we present participants with an imaginary scenario (two alien creatures on a planet) and a theory (one creature is predator and the other one is prey) and we ask them to rate the likelihood truth of the theory based on a simple fact (We adapted this method from Gregg et al.,2017; see the original study [here](https://journals.sagepub.com/doi/10.1080/17470218.2015.1099162)). Then, in a between-subject manipulation, participants will be presented with either 6 simple arguments (Modus Ponens conditionals) or 6 more complex arguments (Modus Tollens conditionals), and they will be asked to rate the likelihood truth of the initial theory on 7 stages. 

The first stage is the base rating stage. The next three stages include supportive arguments of the theory and the last three arguments include disproving arguments of the theory. We hypothesized that the group with simple arguments shows better persuasion (as it reflects in higher ratings for the supportive arguments) and better dissuasion (as it reflects in lower ratings for the opposing arguments).

In the last part of the study, participants will be asked to answer several cognitive capacity/style measures including CRT, AOT-E, mindware, and numeracy scales. We hypothesized that cognitive ability, cognitive style, and open-mindedness are positive predictors of persuasion and dissuasion. These associations should be more pronounced for participants in the group with complex arguments because the ability and willingness to engage in deliberative thinking may favor participants to assess the underlying logical structure of those arguments. However, for participants in the simple group, the logical structure of arguments is more evident, so participants with lower ability can still assess the logical status of those arguments.
 

```{r fig.align='center', echo=FALSE}
knitr::include_graphics(here('inputs','exp_design.png'))
```

Thus, our hypotheses for this experiment are as follows:

- Participants in the group with simple arguments have higher ratings for supportive arguments (They are more easily persuaded than those in the group with complex arguments).

- Participants in the group with simple arguments have lower ratings for opposing arguments (They are more easily dissuaded than those in the group with complex arguments).

- There are significant associations between CRT, AOT-E, Numeracy, and mindware with both persuasion and dissuasion indexes in each group and in the entire sample. The relationship between these measures should be stronger, although not significantly, for participants in the group with complex arguments.


```{r echo=FALSE, out.width="550px", out.height="400px"}
knitr::include_graphics(here('inputs','prediction_plot.png'))
```


# Getting Ready

First, we need to design the experiment. For this experiment, we use online platforms for data collection. There are several options such as Gorilla, JSpsych, Qualtrics, psychoJS (pavlovia), etc. Since we do not need any reaction time data, we simply use Qualtrics. For an overview of different lab-based and online platforms, see [here](https://omidghasemi21.github.io/human_data/Scripts/behavioral_data.html). 

Next, we need to decide on the number of participants (sample size). For this study, we do not sue power analysis since we cannot access more than 120 participants. However, it is highly suggested calculate sample size using power estimation. You can find some nice tutorials on how to do that [here](https://julianquandt.com/post/power-analysis-by-data-simulation-in-r-part-i/), [here](https://nickch-k.github.io/EconometricsSlides/Week_08/Power_Simulations.html), and [here](https://cran.r-project.org/web/packages/paramtest/vignettes/Simulating-Power.html).

After we created the experiment and decided on the sample size, the next step is to preresigter the study. However, it would be better to do a pilot with 4 or 5 participants, clean all the data, do the desired analysis, and then pre-register the analysis and those codes. You can find the preregistration form for the current study [here](https://osf.io/79r6e).

Finally, we need to restructure our project in a tidy folder with different sub-folders. Having a clean and tidy folder structure can save us! There are different formats of folder structure (for example, see [here](http://nikola.me/folder_structure.html) and [here](https://slides.com/djnavarro/workflow)), but for now, we use the following structure:

```{r echo=FALSE, out.width="700px", out.height="200px"}
knitr::include_graphics(here('inputs','folder_structure.png'))
```


# Introduction to R
```{r message=FALSE, eval=F}
# load libraries
library(tidyverse)
library(here)
library(janitor)
library(broom)
library(afex)
library(emmeans)
library(knitr)
library(kableExtra)
library(ggsci)
library(patchwork)
library(skimr)
# install.packages("devtools")
# devtools::install_github("easystats/correlation")
library("correlation")
options(scipen=999) # turn off scientific notations
options(contrasts = c('contr.sum','contr.poly')) # set the contrast sum globally 
options(knitr.kable.NA = '')
```

R can be used as a calculator. For mathematical purposes, be careful of the order in which R executes the commands.

```{r}
10 + 10

4 ^ 2

(250 / 500) * 100
```

R is a bit flexible with spacing (but no spacing in the name of variables and words)

```{r}
10+10

10                 +           10
```

R can sometimes tell that you're not finished yet

```{r eval=F}
10 +
```

How to create a *variable*? Variable assignment using `<-` and `=`. Note that R is case sensitive for everything

```{r}
pay <- 250

month = 12

pay * month

salary <- pay * month
```


Few points in naming variables and vectors: use short, informative words, keep same method (e.g., not using capital words, use only _ or . ).

## Function 
Function is a set of statements combined together to perform a specific task. When we use a block of code repeatedly, we can convert it to a function. To write a function, first, you need to *define* it:

```{r}
my_multiplier <- function(a,b){
  result = a * b
  return (result)
}
```

This code do nothing. To get a result, you need to *call* it:

```{r}
my_multiplier (2,4)
```

Fortunately, you do not need to write everything from scratch. R has lots of built-in functions that you can use:
```{r}
round(54.6787)
round(54.5787, digits = 2)
```

Use `?` before the function name to get some help. For example, `?round`. You will see many functions in the rest of the workshop.

## Basic Data Types in R:

function `class()` is used to show what is the type of a variable.


1. *Logical*: `TRUE`, `FALSE` can be abbreviated as `T`, `F`.  They has to be capital, 'true' is not a logical data:
```{r}
class(TRUE)
class(F)
```

2. *Numeric*: all numbers e.g. 5,  10.5,  11,37;  a special type of numeric is "integer" which is numbers without decimal. Integers are always numeric, but numeric is not always integer:
```{r}
class(2)
class(13.46)
```

3. *Character*: text for example, "I love R" or "4" or "4.5":
```{r}
class("ha ha ha ha")
class("56.6")
class("TRUE")
```

Can we change the type of data in a variable? Yes, you need to use the function `as.---()`

```{r}
as.numeric(TRUE)
as.character(4)
as.numeric("4.5")
as.numeric("Hello")
```


## Data Structures in R


**Vector**: when there are more than one number or letter stored. Use the combine function c() for that.

```{r}
sale <- c(1, 2, 3,4, 5, 6, 7, 8, 9, 10) # also sale <- c(1:10)

sale <- c(1:10)

sale * sale
```

*Subsetting a vector*:

```{r}
days <- c("Saturday", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday")

days[2]
days[-2]

days[c(2, 3, 4)]
```


### Exercise:

Create a vector named `my_vector` with numbers from 0 to 1000 in it:

```{r}
my_vector <- (0:1000)

mean(my_vector)
median(my_vector)
min(my_vector)
range(my_vector)
class(my_vector)
sum(my_vector)
sd(my_vector)
```

**List**: allows you to gather a variety of objects under one name (that is, the name of the list) in an ordered way. These objects can be matrices, vectors, data frames, even other list.

```{r}
my_list = list(sale, 1, 3, 4:7, "HELLO", "hello", FALSE)
my_list
```

**Factor**: Factors store the vector along with the distinct values of the elements in the vector as labels. The labels are always character irrespective of whether it is numeric or character. For example, variable gender with "male" and "female" entries:

```{r}
gender <- c("male", "male", "male", " female", "female", "female")
gender <- factor(gender)
```

R now treats gender as a nominal (categorical) variable: 1=female, 2=male internally (alphabetically).
```{r}
summary(gender)
```

*Question*: why when we ran the above function i.e. summary(), it showed three and not two levels of the data? *Hint*: run 'gender'.

```{r}
gender
```

So, be careful of spaces!

### Exercise:
Create a gender factor with 30 male and 40 females (*Hint*: use the `rep()` function):
```{r}
gender <- c(rep("male",30), rep("female", 40))
gender <- factor(gender)
gender
```

There are two types of categorical variables: nominal and ordinal. How to create ordered factors (when the variable is nominal and values can be ordered)? We should add two additional arguments to the `factor()` function: `ordered = TRUE`, and `levels = c("level1", "level2")`. For example, we have a vector that shows participants' education level.

```{r}
edu<-c(3,2,3,4,1,2,2,3,4)

education<-factor(edu, ordered = TRUE)
levels(education) <- c("Primary school","high school","College","Uni graduated")
education
```

### Exercise:
We have a factor with `patient` and `control` values. Here, the first level is control and the second level is patient. Change the order of levels, so patient would be the first level:

```{r}
health_status <- factor(c(rep('patient',5),rep('control',5)))
health_status

health_status_reordered <- factor(health_status, levels = c('patient','control'))
health_status_reordered
```

Finally, can you relabel both levels to uppercase characters? (*Hint*: check `?factor`)

```{r}
health_status_relabeled <- factor(health_status, levels = c('patient','control'), labels = c('Patient','Control'))
health_status_relabeled
```


**Matrices**: All columns in a matrix must have the same mode(numeric, character, etc.) and the same length. It can be created using a vector input to the matrix function.

```{r}
my_matrix = matrix(c(1,2,3,4,5,6,7,8,9), nrow = 3, ncol = 3)

my_matrix
```

**Data frames**: (two-dimensional objects) can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data type. Let's create a dataframe:

```{r}
id <- 1:200
group <- c(rep("Psychotherapy", 100), rep("Medication", 100))
response <- c(rnorm(100, mean = 30, sd = 5),
             rnorm(100, mean = 25, sd = 5))

my_dataframe <-data.frame(Patient = id,
                          Treatment = group,
                          Response = response)
```

We also could have done the below

```{r}
my_dataframe <-data.frame(Patient = c(1:200),
                          Treatment = c(rep("Psychotherapy", 100), rep("Medication", 100)),
                          Response = c(rnorm(100, mean = 30, sd = 5),
                                       rnorm(100, mean = 25, sd = 5)))
```

In large data sets, the function head() enables you to show the first observations of a data frames. Similarly, the function tail() prints out the last observations in your data set.

```{r eval=F}
head(my_dataframe) 
tail(my_dataframe)
```

```{r echo=F}
head(my_dataframe) %>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
tail(my_dataframe)%>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

Similar to vectors and matrices, brackets [] are used to selects data from rows and columns in data.frames:

```{r}
my_dataframe[35, 3]
```

### Exercise

How can we get all columns, but only for the first 10 participants?

```{r eval=F}
my_dataframe[1:10, ]
```

```{r echo=F}
knitr::kable(my_dataframe[1:10, ]) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)

```
How to get only the Response column for all participants?

```{r}
my_dataframe[ , 3]
```

Another easier way for selecting particular items is using their names that is more helpful than number of the rows in large data sets:
```{r eval=F}
my_dataframe[ , "Response"]
# OR:
my_dataframe$Response

```


# Data Cleaning

Now, suppose we tested 141 students. First, let's read and check the uncleaned data:
```{r message=F, warning=F, eval=F}
# read the raw data
raw_data <- read_csv(here("raw_data","raw_argumentative_exp1.csv"))
head(raw_data)
```

```{r message=F, warning=F, echo=F}
# read the raw data
raw_data <- read_csv(here("raw_data","raw_argumentative_exp1.csv"))

knitr::kable(head(raw_data)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```

Now, let's do some cleanining using `dplyr`, `tidyr` and other `tidyverse` libraries. Finally, we will check the data:
```{r message=F, warning=F, eval=F}
cleaned_data <- raw_data %>% 
  filter(progress == 100) %>% # filter out unfinished participants
  select(-end_date, -status,-ip_address, -duration_in_seconds, -recorded_date:-user_language) %>% #remove some useless columns
  mutate(aote_total= aote1+aote2+aote3+aote4+aote5+aote6+aote7+aote8, # create a total score for our questionnaire
         attentive= case_when(attention_correct >= 2~ 'Yes',T~ 'No')) %>%
  mutate(crt1= case_when(crt1=='4'~ 1,T~0),
         crt2= case_when(crt2=='10'~ 1,T~0),
         crt3= case_when(crt3=='39'~ 1,T~0),
         crt_total= crt1 + crt2 + crt3) %>%
  select(-attention1:-aote8) %>%
  pivot_longer(cols = c(stage1_simple:stage7_simple,stage1_complex:stage7_complex),names_to = 'stage',values_to = 'truth_estimate') %>% # make our dataframe long
  #pivot_wider(names_from = stage, values_from= truth_estimate) # this code change our dataframe back to wide
  filter(!is.na(truth_estimate)) %>% #remove rows with truth_estimate == NA
  mutate(stage= gsub("_.*", "", stage)) %>%
  rename(consent= consent_form) %>% # rename a column
  #mutate_if(is.character, factor) %>%
  mutate(subject= factor(subject), # convert all characters to factor
         group = factor(group),
         attentive = factor(attentive),
         stage = factor(stage))
```


```{r message=F, warning=F, echo=F}
cleaned_data <- raw_data %>% 
  filter(progress == 100) %>% # filter out unfinished participants
  select(-end_date, -status,-ip_address, -duration_in_seconds, -recorded_date:-user_language) %>% #remove some useless columns
  mutate(aote_total= aote1+aote2+aote3+aote4+aote5+aote6+aote7+aote8, # create a total score for our questionnaire
         attentive= case_when(attention_correct >= 2~ 'Yes',T~ 'No')) %>%
  mutate(crt1= case_when(crt1=='4'~ 1,T~0),
         crt2= case_when(crt2=='10'~ 1,T~0),
         crt3= case_when(crt3=='39'~ 1,T~0),
         crt_total= crt1 + crt2 + crt3) %>%
  select(-attention1:-aote8) %>%
  pivot_longer(cols = c(stage1_simple:stage7_simple,stage1_complex:stage7_complex),names_to = 'stage',values_to = 'truth_estimate') %>% # make our dataframe long
  #pivot_wider(names_from = stage, values_from= truth_estimate) # this code change our dataframe back to wide
  filter(!is.na(truth_estimate)) %>% #remove rows with truth_estimate == NA
  mutate(stage= gsub("_.*", "", stage)) %>%
  rename(consent= consent_form) %>% # rename a column
  #mutate_if(is.character, factor) %>%
  mutate(subject= factor(subject), # convert all characters to factor
         group = factor(group),
         attentive = factor(attentive),
         stage = factor(stage))

knitr::kable(head(cleaned_data)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```

Ok, now the data is clean and tidy which means:

> 1. Each variable forms a column.
2. Each observation forms a row.
3. Each type of observational unit forms a table ([Wickham](https://vita.had.co.nz/papers/tidy-data.pdf), 2014).

Check the dataframe and all the data types:
```{r}
str(cleaned_data)
```

Finally, we save our data to the `cleaned_data` folder.

```{r}
write_csv(cleaned_data, here("cleaned_data","argumentative_exp1.csv"))
```

# Descriptive Statistics

> Note: All the data that we use here is manipulated (fabricated) for teaching purpuses. In our study, we failed to find such beautiful and interesting results.

Now, let's do some descriptive statistics. First, we can open a new script called `analysis_exp1.r` and read the cleaned data again. 

```{r message=F, warning=F,}
data_exp1 <- read_csv(here("cleaned_data","argumentative_exp1.csv"))
```

How many participants in total?

```{r message=F, warning=F, eval=F}
data_exp1 %>% summarise(n= n_distinct(subject))
```


```{r message=F, warning=F, echo=F}
data_exp1 %>% summarise(n= n_distinct(subject))%>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), full_width = F)
```

how many participants in each group?
```{r message=F, warning=F, eval=F}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  group_by(group) %>% 
  count() 
```

```{r message=F, warning=F, echo=F}
data_exp1 %>% group_by(subject) %>% filter(row_number()==1) %>% ungroup () %>% group_by(group) %>% count() %>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T)
```

Find the mean and sd for numeric variables using base R `summary` function:

```{r}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>%
  summary()
```

Alternatively, we can use `base R `summary` function`skimr` library:
```{r eval=F}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  dplyr::select (age, numeracy_total, mindware_total, aote_total, crt_total) %>% 
  skimr::skim()
```

```{r echo=F}
data_exp1 %>% 
  group_by(subject) %>% 
  filter(row_number()==1) %>% 
  ungroup () %>% 
  dplyr::select (age, numeracy_total, mindware_total, aote_total, crt_total) %>% 
  skimr::skim() %>%
  knitr::kable() %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```


### Exercise

For this exercise, we use a dataset of one of my own studies. In this study, we asked participants to guess the physical brightness of reasoning arguments and then we gave a cognitive ability test. (See the original study [here](https://osf.io/ebxnf/)). Open `ghasemi_brightness_exp4.csv` file and answer to the following questions:

1. How many participants did we test in total?
2. Find out how many male and female we tested.
3. Calculate mean and sd for age and cognitive ability (`ah4`).


```{r warning=F, message=F}
ghasemi_data <- read_csv(here("cleaned_data","ghasemi_brightness_exp4.csv"))

ghasemi_data %>% summarise(n = n_distinct(participant)) # number of participants:200

ghasemi_data %>% group_by (participant) %>% filter (row_number()==1) %>% group_by (gender) %>% summarise(n= n()) %>% ungroup() # 183 female, 17 male

ghasemi_data %>% dplyr::select (age, ah4) %>% skimr::skim() # mean and sd for age and cognitive ability
```


# Data Visualization

First, we need to create a dataset with aggregated `truth estimate` scores over `group` and `stage`. We will use this dataset for line and bar graphs.

```{r message=F, warning=F, dpi= 300, fig.height=3, fig.width=5}

aggregated_data_exp1 <- data_exp1 %>%
  group_by(stage, group) %>%
  mutate(truth_estimate = mean(truth_estimate)) %>%
  ungroup()

barplot_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=stage, y= truth_estimate, fill=group)) +
  geom_bar(stat = "identity", position= "dodge")+
  # stat_summary(fun= mean, geom = "bar", position = "dodge")+ # can be used instead of geom_bar() for long dataframes
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  scale_fill_jama() 

barplot_exp1


barplot_facet_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=group, y= truth_estimate, fill=stage)) +
  geom_bar(stat = "identity", position= "dodge")+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "none",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  facet_wrap(~stage)+
  scale_fill_jco() 

barplot_facet_exp1


lineplot_exp1 <- aggregated_data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, group= group, color= group)) +
  geom_line(aes(linetype= group)) +
  geom_point(size= 5)+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_classic() +
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_color_nejm() 

lineplot_exp1


violinplot_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_violin()+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_fill_d3() 

violinplot_exp1


boxplot_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_boxplot()+
  #geom_point(position = position_dodge(width=0.75), alpha= .5)+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12)) +
  scale_fill_simpsons() 

boxplot_exp1


boxplot_facet_exp1 <- data_exp1 %>%
  ggplot(aes(x=factor(stage), y= truth_estimate, fill= group)) +
  geom_boxplot()+
  labs (x= '', y= "Truth Likelihhod Estimate") + 
  theme_bw() + 
  theme(legend.position = "bottom",
        axis.text=element_text(size=11),
        axis.title = element_text(size = 12),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  facet_wrap(~group)+
  scale_color_simpsons() 

boxplot_facet_exp1


```

How to combine multiple plots? We can use the `patchwork` package. A nice tutorial on using this package can be found [here](https://patchwork.data-imaginist.com/articles/patchwork.html)
```{r dpi= 300, fig.height=7, fig.width=9}

combined_plot_exp1 <- (barplot_facet_exp1+lineplot_exp1) / (violinplot_exp1+boxplot_exp1)
combined_plot_exp1
```

How to save a plot?
```{rmessage=F}
ggsave(combined_plot_exp1, filename = here("outputs","combined_plot_exp1.png"), dpi=300)
```

# Data Analysis


## t-test

Is there a difference between groups at the first stage? Ideally, we want participants' ratings at the first stage be similar for both groups because we have not done any manipulations. Previous graphs showed us that ratings of simple and complex group at this stage are pretty close. Let's test that using an **independent t-test** (because we have 2 independent groups):

```{r}
# Is there a difference between groups at the first stage?
data_exp1 %>% 
  group_by(group) %>% 
  filter(stage=='stage1') %>% 
  ungroup () %>%
  t.test(truth_estimate~group, data = ., paired=FALSE)
```

Now, we wonder if opposing arguments were effective at all, regardless of participants' group. So, we would like to test if ratings at the final stage are lower than ratings at the stage 4? Since a pair of score at stage 4 and stage 7 is coming from a same person, we use **paired t-test**.

```{r}
# Is there a difference between ratings of stage4 and stage7?
data_exp1 %>% 
  filter(stage=='stage4' | stage=='stage7') %>% 
  ungroup () %>%
  t.test(truth_estimate~stage, data = ., paired=TRUE)
```


### Exercise

John et al. (2019) investigated the consequences of backing down (changing one's mind in lights of evidence)and how other people view someone who change their mind. In their second experiments, they presented participants either with a person who changes their mind or a person who refuses to back down. Then, they asked participants to rate how intelligent and confident the person is (See the original study [here](https://www.hbs.edu/faculty/Publication%20Files/John%20et%20al%20-%20self-presentational%20consequences_b85b2c43-a5b5-474c-9e2c-e9853b10727e.pdf)). They reported that: 

> "Relative to the entrepreneur who did not back down, participants judged the entrepreneur who backed down as more intelligent (M_backed_down=5.13 out of 7, SD=1.09; M_did_not_back_down=3.97, SD=1.54; t(271.12)=−7.59, p < .001) but less confident (M_backed_down=4.50 out of 7, SD=1.36; M_did_not_back_down=5.65, SD=1.10; t(291.01)=8.08, p < .001).".

Open the `john_backdown_exp2.csv` file and try to reproduce their results. Run two separate independent t-test, one with `intelligent` as the dependent variable and one with `confident` as the dependent variable. For both t-test, use `back_down` as the between-subject independent variable.

```{r message=F, warning=F}
john_data <- read_csv(here("cleaned_data","john_backdown_exp2.csv"))


t.test(intelligent~back_down, data = john_data, paired=FALSE)
t.test(confident~back_down, data = john_data, paired=FALSE)
```


## Analysis of Variance (ANOVA)

Now, let's answer our main question: Do participants in the simple group show higher ratings for supportive arguments (stage 2 to 4) and lower ratings for opposing arguments (stage 5 to 7), compared to participants in the complex group? If this is the case. we expect an interaction in the traditional **Analysis of Variance (AONVA)** test.

```{r message=F, warning=F}
aov_m1 <- aov_car (truth_estimate ~ group*stage +
                     Error(subject/stage), data = data_exp1)
```

```{r echo=F}
knitr::kable(nice(aov_m1)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

As you can see, we found a significant main effect of stage and a significant group by stage interaction. We can use the `emmeans` package to do post-hoc tests.

```{r warning=F, message=F}
# main effect of stage
emmeans(aov_m1, 'stage')
pairs(emmeans(aov_m1, 'stage'), adjust= 'holm')
```


```{r warning=F, message=F}
# group by stage interaction
emmeans(aov_m1, "group", by= "stage")
update(pairs(emmeans(aov_m1, "group", by= "stage")), by = NULL, adjust = "holm") 
```

You can use the `afex_plot` function from afex to create beautiful plots. Those plots interacts nicely with ggplot:
```{r message=F, warning=F, dpi= 300}
afex_plot(aov_m1, x = "stage", trace = "group", error='between',
          line_arg = list(size=1),
          point_arg = list(size=3.5),
          data_arg = list(size= 1, color= 'grey', width=.4),
          data_geom = geom_boxplot,
          mapping = c("linetype", "shape", "fill"),
          legend_title = "Group") +
  labs(y = "Truth Likelihhod Estimate", x = "") +
  theme_bw()+ # remove the grey background and grid
  theme(axis.text=element_text(size=13),
        axis.title = element_text(size = 13),
        legend.text=element_text(size=13),
        legend.title=element_text(size=13),
        legend.position='bottom',
        legend.key.size = unit(1, "cm"),
        legend.background = element_rect(colour = 'black', fill = 'white', linetype='solid'))+
  scale_color_simpsons() +
  scale_fill_simpsons()
```


If you are interested in this topic, check out this nice tutorial about [using afex to run ANOVA](https://cran.r-project.org/web/packages/afex/vignettes/afex_anova_example.html), and also this interesting tutorial on the [emmeans package](https://aosmith.rbind.io/2019/03/25/getting-started-with-emmeans/).

### Exercise

Rotello et al. (2018) investigated the association between the race (White vs. Black faces) and the gun-tool judgments. In their first experiments, they presented participants with 16 White male faces and 16 Black male faces, and following that 8 images of guns and 8 images of tools. They asked participants to judge if the object is a tool or a gun by pressing keyboard buttons. Then, they ran an ANOVA to see if participants' gun responses are higher for any of the races. So, they included prime race (Black, White) and target identity (gun, tool) as independent variables and participants' gun responses as dependent variable into their linear model (See the original study [here](https://psyarxiv.com/a7k96)). They found that: 

> "Participants made more gun responses to guns than to tools, F(1,45) = 53243, p < 0.0001, η2g = 0.998. However, the race of the prime face did not matter, F(1,45) = 0.287, p > 0.59, η2g = 0.001, nor was there an interaction of prime race with target object, F(1,45) = 0.022, p > 0.88, η2g = 0.000)".

Open the `rotello_shooter_exp1.csv` file and try to reproduce their results. Run an ANOVA (type III) with `resp` as the dependent variable and target, prime, and their interaction as independent variables.


```{r message=F, warning=F}
# load the general data file
rotello_data <- read_csv(here("cleaned_data","rotello_shooter_exp1.csv"))

# ANOVA
rotello_aov <- aov_car (resp ~ target*prime +
           Error(subject/target*prime), data = rotello_data)
```

```{r echo=F}
knitr::kable(nice(rotello_aov)) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```



## Correlation

Now, let's answer to another question of this study: does persuasion and dissuasion is related to open-mindedness, cognitive ability, reasoning abilities, and cognitive style? To answer this question, we need to create two indexes (scores) one for persuasion and one for dissuasion. Then we can do a correlation test:

```{r message=F, eval=F, fig.align='center', dpi=300}
cor_data_exp1 <- data_exp1 %>% 
  pivot_wider(names_from = stage, values_from = truth_estimate) %>%
  group_by(subject) %>%
  mutate(persuasion_index= stage2+ stage3+ stage4 - stage1,
         dissuasion_index= (101-stage5) + (101-stage6) + (101-stage7) - (101-stage4)) %>%
  ungroup()%>%
  dplyr::select(persuasion_index,dissuasion_index,aote_total,numeracy_total,crt_total,mindware_total)

#---------- Base R:
cor(cor_data_exp1, method = "pearson",  use = "complete.obs")

#---------- Psych library:
cor_data_exp1 %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(cor_data_exp1) %>% summary()

#---------- apaTables library:
cor_data_exp1 %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
```

```{r message=F, echo=F, fig.align='center', dpi=300}
cor_data_exp1 <- data_exp1 %>% 
  pivot_wider(names_from = stage, values_from = truth_estimate) %>%
  group_by(subject) %>%
  mutate(persuasion_index= stage2+ stage3+ stage4 - stage1,
         dissuasion_index= (101-stage5) + (101-stage6) + (101-stage7) - (101-stage4)) %>%
  ungroup()%>%
  dplyr::select(persuasion_index,dissuasion_index,aote_total,numeracy_total,crt_total,mindware_total)

#---------- Base R:
cor(cor_data_exp1, method = "pearson",  use = "complete.obs")%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")

#---------- Psych library:
cor_data_exp1 %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(cor_data_exp1) %>% summary()%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")

```



### Exercise

Pennycook et al. (2020) investigated the relationship between actively open-minded thinking style about evidence (AOT-E) and different political, scientific, and religious beliefs (see the original paper [here](https://psyarxiv.com/a7k96)). In their first experiment, they calculated the correlation of AOTE and scientific beliefs items (global warming, evolution, etc.) and they found the following results:

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "adapted from [Pennycook et al. (2020)](https://psyarxiv.com/a7k96)"}
knitr::include_graphics(here('inputs','pennycook_corr.png'))
```

Open the `pennycook_aote_exp1.csv` file and try to reproduce their results by creating the same correlation matrix.

```{r message=F, eval=F}
pennycook_data <- read_csv(here("cleaned_data","pennycook_aote_exp1.csv")) 


#---------- Base R:
cor(pennycook_data, method = "pearson",  use = "complete.obs")

#---------- Psych library:
pennycook_data %>% 
  psych::pairs.panels(method = "pearson", hist.col = "#00AFBB", density = T, ellipses = F, stars = T)

#---------- Correlation library:
correlation::correlation(pennycook_data) %>% summary()

#---------- apaTables library:
pennycook_data %>% 
  apaTables::apa.cor.table(filename="./outputs/CorMatrix.doc", show.conf.interval=T)
```


```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
pennycook_data <- read_csv(here("cleaned_data","pennycook_aote_exp1.csv")) %>%
  clean_names()

correlation::correlation(pennycook_data) %>% summary() %>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")

```


## Linear Regression

In the previous section, we found that open-mindedness (AOT-E) is correlated with persuasion. Now, one may ask if open-mindedness can predict persuasion after controlling for reasoning and controlling abilities? To answer that, we can run a multiple regression analysis:
```{r}
exp1_reg=lm(persuasion_index ~ aote_total+ numeracy_total+ crt_total+ mindware_total,
                  data=cor_data_exp1)
```

```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
broom::tidy(exp1_reg)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)
```

### Exercise

Trémolière and Djeriouat (2020) examined the role of *cognitive reflection* and *belief in science* in climate change skepticism. In their first study, they revealed that cognitive reflection and belief in science negetively predicted climate change skepticism even after controlling for demographic and cognitive ability variables (see the original paper [here](https://psyarxiv.com/vp8k6/)). 

```{r echo=FALSE, out.width="700px", out.height="350px", fig.cap= "adapted from [Trémolière and Djeriouat (2020)](https://psyarxiv.com/vp8k6/)"}
knitr::include_graphics(here('inputs','tremoliere_reg.png'))
```

Open the `tremoliere_data_exp1.csv` file and try to reproduce their results by running a multiple linear regression. Enter age, gender, education, belief in science, literacy, numeracy (Numtotal), and cognitive reflection as predictors and enter climate change skepticism (climato) as the outcome variable.

```{r message=F}
Tremoliere_data <- read_csv(here("cleaned_data","tremoliere_data_exp1.csv"))

Tremoliere_reg=lm(Climato ~ Age+ Gender+ Education+ BeliefInSciencetotal+ Literacy+ Numtotal+ CognitiveReflection,
                    data=Tremoliere_data)
```


```{r message=F, eval=T, echo=F, fig.align='center', dpi=300}
broom::tidy(Tremoliere_reg)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = T)

glance(Tremoliere_reg)%>%
  knitr::kable(digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "bordered", "condensed"), fixed_thead = T, full_width = F)%>%
  scroll_box(width = "780px")
```


# Rmarkdown

To be completed...


# References

- Ghasemi, O., Handley, S., & Howarth, S. (2020). The Bright Homunculus in our Head: Individual Differences in Intuitive Sensitivity to Logical Validity.

- John, L. K., Jeong, M., Gino, F., & Huang, L. (2019). The self-presentational consequences of upholding one’s stance in spite of the evidence. Organizational Behavior and Human Decision Processes, 154, 1-14.

- Pennycook, G., Cheyne, J. A., Koehler, D. J., & Fugelsang, J. A. (2020). On the belief that beliefs should change according to evidence: Implications for conspiratorial, moral, paranormal, political, religious, and science beliefs. Judgment and Decision Making, 15(4), 476.

- Rotello, C. M., Kelly, L. J., Heit, E., Vazire, S., & Vul, E. (2018). The Shape of ROC Curves in Shooter Tasks: Implications for Best Practices in Analysis. Collabra: Psychology, 4(1).

- Trémolière, B., & Djeriouat, H. (2020). Don’t you see that its cold! Exploring the roles of cognitive reflection, climate science literacy, illusion of knowledge, and political orientation in climate change skepticism.

- Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.